Framework and Resources for Natural Language Parser Evaluation

نویسنده

  • Tuomo Kakkonen
چکیده

Because of the wide variety of contemporary practices used in the automatic syntactic parsing of natural languages, it has become necessary to analyze and evaluate the strengths and weaknesses of different approaches. This research is all the more necessary because there are currently no genreand domain-independent parsers that are able to analyze unrestricted text with 100% preciseness (I use this term to refer to the correctness of analyses assigned by a parser). All these factors create a need for methods and resources that can be used to evaluate and compare parsing systems. This research describes: (1) A theoretical analysis of current achievements in parsing and parser evaluation. (2) A framework (called FEPa) that can be used to carry out practical parser evaluations and comparisons. (3) A set of new evaluation resources: FiEval is a Finnish treebank under construction, and MGTS and RobSet are parser evaluation resources in English. (4) The results of experiments in which the developed evaluation framework and the two resources for English were used for evaluating a set of selected parsers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Engineering in Persian Dependency Parser

Dependency parser is one of the most important fundamental tools in the natural language processing, which extracts structure of sentences and determines the relations between words based on the dependency grammar. The dependency parser is proper for free order languages, such as Persian. In this paper, data-driven dependency parser has been developed with the help of phrase-structure parser fo...

متن کامل

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

Resources for Turkish morphological processing

We present a set of language resources and tools—a morphological parser, a morphological disambiguator, and a text corpus—for exploiting Turkish morphology in natural language processing applications. The morphological parser is a state-of-the-art finite-state transducer-based implementation of Turkish morphology. The disambiguator is based on the averaged perceptron algorithm and has the best ...

متن کامل

Automatic Evaluation of Parser Robustness: Eliminating Manual Labor and Annotated Resources

This article describes an automatic evaluation procedure for NLP system robustness under the strain of noisy and ill-formed input. The procedure requires no manual work or annotated resources. It is language and annotation scheme independent and produces reliable estimates on the robustness and accuracy of NLP systems. The procedure was applied to five parsers and one part-of-speech tagger on S...

متن کامل

Combining Language Resources Into A Grammar-Driven Swedish Parser

This paper describes work on a rule-based, open-source parser for Swedish. The central component is a wide-coverage grammar implemented in the GF formalism (Grammatical Framework), a dependently typed grammar formalism based on Martin-Löf type theory. GF has strong support for multilinguality and has so far been used successfully for controlled languages (Angelov and Ranta, 2009) and recent exp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/0712.3705  شماره 

صفحات  -

تاریخ انتشار 2007